GC bug fix for `max_collect_interval` computation #45727

kpamnany · 2022-06-17T20:49:32Z

Currently, `max_collect_interval` is constrained to `totalmem / ncores / 2` for `_P64`, which results in a very short collect interval when you're running with a smaller number of threads on a machine with many cores.

This PR changes it to `totalmem / nthreads / 2`, which, for two of our tests, resulted in 40% and 60% runtime reductions (!!), as well as GC time reductions from 46% to 10% and from 64% to 11%.

Spotted by @janrous-rai.
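To make the difference concrete, here is a minimal C sketch of the two caps as the description states them. The helper name `collect_interval_cap` and the example machine (256 GiB of RAM, 64 cores, 8 Julia threads) are hypothetical illustrations, not the runtime's actual code or the machines behind the numbers quoted above.

```c
#include <stdint.h>

/* Hypothetical helper, not Julia's gc.c: the upper bound on
 * max_collect_interval as described in this PR, totalmem / divisor / 2.
 * The old code divided by the core count; the fix divides by the
 * number of Julia threads. */
static uint64_t collect_interval_cap(uint64_t total_mem, uint64_t divisor)
{
    return total_mem / divisor / 2;
}

/* Worked example (hypothetical machine: 256 GiB RAM, 64 cores, 8 threads):
 *   old cap: 256 GiB / 64 cores  / 2 =  2 GiB
 *   new cap: 256 GiB / 8 threads / 2 = 16 GiB
 * i.e. the ceiling on the collect interval is 8x higher. */
```

With fewer threads than cores, dividing by the thread count raises the ceiling by a factor of `ncores / nthreads` (here 64 / 8 = 8), which is how a higher ceiling can translate into fewer collections.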

Sacha0

Thanks Kiran! :)

janrous-rai · 2022-06-17T22:32:02Z

Wait. I think you need to make sure that `jl_n_threads` is properly initialized. By default, it is initialized after GC setup.


janrous-rai · 2022-06-17T22:32:33Z

Or at least this was the case in 1.7.1.


kpamnany · 2022-06-21T16:35:38Z

> It's done after GC setup by default.

Not sure exactly when it changed, but GC initialization happens after threads are initialized.

kpamnany · 2022-06-21T16:41:34Z

The `tester_linux32` failure is an `OutOfMemoryError`. Retrying now, but could this PR be causing it? @oscardssmith?


oscardssmith · 2022-06-21T16:52:10Z

If this PR causes OOMs, something else is broken. @chflood any thoughts?

gbaraldi · 2022-06-21T17:57:52Z

32-bit builds OOM a lot, or more specifically run out of address space, so it could be something else.

kpamnany · 2022-06-21T18:28:08Z

`tester_linux32` passed on the retry, so I think this is good to go.

KristofferC · 2022-07-04T09:31:46Z

Trying to run with this backported to 1.8, I get:

`Floating point exception (core dumped)`

I guess `jl_n_threads` is not initialized?

cc @kpamnany
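The crash is consistent with an integer division by zero: in C that is undefined behavior, and on x86 Linux it typically raises `SIGFPE`, which the shell reports as "Floating point exception". If `jl_n_threads` were still 0 when the GC computes `totalmem / nthreads / 2`, that is what would happen. The sketch below is a hypothetical defensive variant (the name `safe_collect_interval_cap` and the fallback to 1 are assumptions, not code from this PR); the fix that was actually backported reorders initialization instead.

```c
#include <stdint.h>

/* Hypothetical defensive variant, not the fix that was merged.
 * If the thread count is still 0 (not yet initialized), dividing by it is
 * undefined behavior in C; on x86 Linux an integer divide by zero traps
 * with SIGFPE ("Floating point exception (core dumped)").
 * Treating an uninitialized count as 1 avoids the trap. */
static uint64_t safe_collect_interval_cap(uint64_t total_mem, int nthreads)
{
    uint64_t divisor = nthreads > 0 ? (uint64_t)nthreads : 1;
    return total_mem / divisor / 2;
}
```

Falling back to 1 yields the most permissive cap, so a guard like this hides the ordering problem rather than fixing it; setting the thread count before GC setup is the more robust choice.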

kpamnany · 2022-07-04T13:01:36Z

I'll check. Should I open a PR to a particular branch?

KristofferC · 2022-07-04T13:24:57Z

backports-release-1.8

* Bug fix for `max_collect_interval` computation (#45727)
* Move GC init after threading init, to allow use of `jl_n_threads` in GC initialization.
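The second commit addresses an ordering dependency: GC setup reads the thread count, so threading must be initialized first. A minimal sketch of that ordering, in which `n_threads`, `threads_init`, `gc_init`, and `runtime_init` are hypothetical stand-ins for `jl_n_threads` and the runtime's real initialization routines:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the runtime's globals and init routines. */
static int n_threads = 0;                 /* plays the role of jl_n_threads */
static uint64_t max_collect_interval = 0;

static void threads_init(int requested)
{
    n_threads = requested > 0 ? requested : 1;
}

static void gc_init(uint64_t total_mem)
{
    /* Relies on threads_init() having run first. */
    assert(n_threads > 0);
    max_collect_interval = total_mem / (uint64_t)n_threads / 2;
}

static void runtime_init(int requested_threads, uint64_t total_mem)
{
    threads_init(requested_threads);  /* must come first */
    gc_init(total_mem);               /* now safe to divide by n_threads */
}
```

The `assert` in `gc_init` turns a silent ordering mistake into an immediate, descriptive failure in debug builds, instead of a `SIGFPE` from the division.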


kpamnany added the GC Garbage collector label Jun 17, 2022

kpamnany requested review from chflood and oscardssmith June 17, 2022 20:49

oscardssmith approved these changes Jun 17, 2022


KristofferC added the backport 1.8 Change should be backported to release-1.8 label Jun 17, 2022

Sacha0 approved these changes Jun 17, 2022


oscardssmith merged commit c4c36ed into JuliaLang:master Jun 21, 2022

kpamnany deleted the kp/fix-maxmem branch June 21, 2022 18:32

KristofferC mentioned this pull request Jul 5, 2022

Backports for 1.8-rc2/1.8.0 #45491 (merged)

KristofferC removed the backport 1.8 Change should be backported to release-1.8 label Jul 6, 2022
